CONCEPT STROKE: Analytical pipeline
Introduction
Definition
CONCEPT STROKE is a study of the acute care received by patients with acute ischaemic stroke. Its aim is to show the relevance of care pathways (traces/trajectories) to the outcomes and efficiency of stroke care.
- Participating regions: Aragón, País Vasco, Cataluña, Navarra and Valencia.
It is a two-stage design:
1- Cross-sectional data mining design
2- Quasi-experimental design comparing interventions in acute ischaemic stroke.
The main endpoints are:
1- In the first stage, the pathway of care as it occurs in real life and the propensity of a patient to follow a specific pathway (trace).
2- In the second stage, the survival of patients 30 days and 6 months after admission to an emergency room.
Cohort
The cohort is defined as patients admitted to hospital due to acute ischaemic stroke.
Inclusion criteria: Patients aged 18 years or older admitted to the emergency department (or with an unplanned hospital admission) with a principal diagnosis of acute ischaemic stroke during the study period.
Exclusion criteria: Patients aged 17 years or younger; Patients with a diagnosis of acute haemorrhagic stroke or with other non-specific stroke diagnoses.
Study period: 01-01-2010 to most recent data.
Analysis plan
A- Process mining to discover and compare actual care pathways with the theoretical pathway and with those present in the participating regions.
B- Survival analysis to provide prediction of health outcomes within each pathway.
C- Hierarchical generalised additive modelling to compare the effectiveness of interventions within pathways.
D- Economic evaluation to measure economic impact and efficiency.
Running code
Descriptive analysis
To study the observed data, we performed a small exploratory analysis. First, the DataFrame must be converted to an Event Log object. One drawback when creating the Event Log is the differing granularity of the dates: hospital dates are usually accurate to the day, while emergency dates are usually accurate to the second. We therefore need a function that checks that the dates are correct and flags any that do not make medical sense. Typical errors include emergency and hospital dates falling on the same day, in which case the hospital date is automatically ordered first because of its coarser granularity, or an emergency discharge date that precedes the admission date.
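The date checks described above can be sketched as follows. This is a minimal illustration only: the argument names (`emergency_admission`, `emergency_discharge`, `hospital_admission`) are hypothetical and not taken from the project's data model. Hospital dates are compared at day granularity so that a date-only value is not treated as earlier than a same-day emergency timestamp.

```python
from datetime import date, datetime

def check_dates(emergency_admission, emergency_discharge, hospital_admission):
    """Return a list of problems found in one patient's dates.

    emergency_* are datetimes (second precision); hospital_admission is a
    date (day precision). All names are illustrative assumptions.
    """
    problems = []
    if emergency_discharge < emergency_admission:
        problems.append("emergency discharge before emergency admission")
    # Compare at day granularity: a date-only hospital value would otherwise
    # sort before any same-day emergency timestamp.
    if hospital_admission < emergency_admission.date():
        problems.append("hospital admission before emergency admission")
    if hospital_admission > emergency_discharge.date():
        problems.append("hospital admission after emergency discharge")
    return problems

ok = check_dates(datetime(2020, 3, 1, 10, 15, 0),
                 datetime(2020, 3, 1, 18, 40, 0),
                 date(2020, 3, 1))
bad = check_dates(datetime(2020, 3, 2, 9, 0, 0),
                  datetime(2020, 3, 1, 23, 0, 0),
                  date(2020, 3, 1))
print(ok)
print(bad)
```

A same-day hospital admission passes the check, while a discharge that precedes the admission is reported.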
As part of the exploratory analysis it is useful to know how many different pathways appear and how frequent each one is. This shows what percentage of cases follow the most frequent pathways and which pathways are isolated cases.
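Counting distinct pathways and their frequency can be sketched with the standard library alone. The event log layout (a mapping from case id to an ordered list of activities) and the activity names are assumptions made for this example, not the project's actual structures.

```python
from collections import Counter

def variant_frequencies(event_log):
    """Count distinct traces (variants) and their share of all cases.

    event_log maps a case id to its ordered list of activities.
    """
    counts = Counter(tuple(trace) for trace in event_log.values())
    total = sum(counts.values())
    # most_common() orders variants from most to least frequent
    return [(variant, n, n / total) for variant, n in counts.most_common()]

# Hypothetical toy log; activity names are illustrative only
log = {
    "p1": ["emergency", "imaging", "hospital"],
    "p2": ["emergency", "imaging", "hospital"],
    "p3": ["emergency", "imaging", "fibrinolysis", "hospital"],
    "p4": ["emergency", "hospital"],
}
freqs = variant_frequencies(log)
for variant, n, share in freqs:
    print(" -> ".join(variant), n, f"{share:.0%}")
```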
Survival analysis
We carried out a survival analysis with the 10 most frequent traces, constructing a Kaplan-Meier curve for each of them for comparison. Subsequently, a Cox model was fitted to compare these pathways and observe the hazard ratio (HR) of each pathway compared with the rest.
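To illustrate what a Kaplan-Meier curve computes, the sketch below implements the product-limit estimator in plain Python. It is a teaching example under simplified assumptions, not the project's code (the Cox summary in the Results section was produced with `lifelines`); the function name and the toy data are hypothetical.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if death was observed, 0 if the patient was right-censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        if deaths[t]:
            # Product-limit step: multiply by the fraction surviving time t
            survival *= 1 - deaths[t] / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data: five patients, two of them censored
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

One curve of this kind would be computed per trace, so that the curves can be compared on a common time axis.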
A- Process Mining
For process mining we created several functions, grouped by the part of the process mining study they belong to:
- Process discovery
- Conformance checking
- Decision mining
- Prediction
Process discovery
Process discovery attempts to find a suitable process model that describes the order of the events/activities executed during a process.
The next step after the descriptive analysis was to build a Petri net to discover the process. For this, there are different algorithms: alpha mining, inductive mining or heuristic mining.
Another type of graph that can be built is the Directly Follows Graph (DFG). Although it can be part of process discovery, it also serves as a descriptive analysis, as it shows all the pathways present in the data.
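A Directly Follows Graph can be represented as a count of how often one activity is immediately followed by another. The sketch below builds that count with the standard library; the traces and activity names are hypothetical.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in traces:
        # zip pairs each activity with its successor in the same trace
        dfg.update(zip(trace, trace[1:]))
    return dfg

# Hypothetical traces; activity names are illustrative only
traces = [
    ["emergency", "imaging", "hospital"],
    ["emergency", "imaging", "fibrinolysis", "hospital"],
    ["emergency", "hospital"],
]
dfg = directly_follows(traces)
for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

Each key is an edge of the graph and each value is the edge's frequency, which is why the DFG doubles as a descriptive summary of the log.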
To reduce the high dimensionality, one option is to filter the traces to keep only the k most frequent traces. In this case we filtered by the k=10 most frequent traces.
Conformance checking
Conformance checking is a technique for comparing a process model with an event log of the same process. As a first approximation, however, we compared the most frequent pathways with the one we established as theoretical using Jaccard similarity, without taking the order of the activities into account. The similarity is the quotient of:
Numerator: number of activities shared by the pathway and the theoretical pathway.
Denominator: number of activities in the union of the pathway and the theoretical pathway.
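The quotient above can be sketched as a set operation. The activity names are hypothetical, and returning 1.0 for two empty traces is a convention chosen for this example only.

```python
def jaccard_similarity(trace, reference):
    """Order-insensitive Jaccard similarity between two activity sequences."""
    a, b = set(trace), set(reference)
    if not a and not b:
        return 1.0  # convention chosen here for two empty traces
    return len(a & b) / len(a | b)

# Hypothetical theoretical and observed pathways
theoretical = ["emergency", "imaging", "fibrinolysis", "hospital"]
observed = ["emergency", "imaging", "hospital"]
score = jaccard_similarity(observed, theoretical)  # 3 shared / 4 in the union
print(score)
```

Because the traces are converted to sets, repeated activities and their order have no effect on the score.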
Decision mining
Decision mining identifies the main patient characteristics that lead patients to follow a certain path. To do this, patient characteristics are added to the Event Log, a Petri net is created using the inductive algorithm, and the decision points of the net are examined. At each decision point we assess the importance of the characteristics, much as a decision tree would at that point. This step may help to identify which variables are important inputs for the prediction model in the next section.
Prediction
To predict which pathway a given patient should follow based on his or her characteristics, we used the bupaR library, which relies on a transformer model to predict the pathway as a sequence of activities. Thus, an Event Log with features and a transformer model are used to predict the next activity.
B- Estimation of outcomes within a path
A proportional-hazards regression is fitted in which the propensity scores for specific paths are the main independent factor in predicting survival at 6 and 12 months after admission.
B.1 Kaplan-Meier survival plot
A Kaplan-Meier survival plot is produced for the 4 intervention possibilities:
- None
- Fibrinolysis
- Mechanical thrombectomy
- Combined (fibrinolysis + mechanical thrombectomy)
B.2 General COX model
A Cox model is built with the survival object (time = survival time; case = exitus) as the dependent variable and the following independent variables:
Categorical: intervention, sex, zip code, hospital, hospital type discharge, weekday, modified Rankin scale, rank trace, holidays, weekend, prescriptions and comorbidities.
Numerical: age, Jaccard similarity measure, duration trace, period, number of prior emergency admissions, number of prior in-hospital admissions.
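For reference, a Cox proportional hazards model of this kind has the standard form below. The notation is introduced here for illustration and is not taken from the project's code: h_0(t) is the unspecified baseline hazard, x_i the covariate vector of patient i and β the coefficient vector. The hazard ratio reported for covariate j is the exponential of its coefficient.

```latex
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^{\top} x_i\right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```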
B.3 Propensity to intervention model
To overcome the violation of the proportional hazards assumption in the Cox model, 4 different propensity to intervention models are estimated, using as covariates those found to be significant (p-value < 0.05) in the general Cox model constructed in the previous section:
- Propensity to fibrinolysis intervention model
- Propensity to mechanical thrombectomy intervention model
- Propensity to combined intervention model
- Propensity to any intervention model
After building the models, the propensity to intervention of each patient is predicted with each model, and the propensity score (PS) is calculated as PS_i = 1 / (1 - p(y)_i), where p(y)_i is the prediction for patient i.
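The PS formula stated above can be applied to a list of predicted probabilities as in this small sketch. The function name and the example probabilities are hypothetical; the guard against p = 1 is an addition made here because the expression is undefined at that value.

```python
def propensity_score(p):
    """PS_i = 1 / (1 - p_i), following the formula given in the text."""
    if not 0 <= p < 1:
        raise ValueError("p must lie in [0, 1)")
    return 1 / (1 - p)

# Hypothetical predicted probabilities of receiving the intervention
predictions = [0.0, 0.5, 0.8]
scores = [propensity_score(p) for p in predictions]
print(scores)
```

Note that the score grows without bound as the predicted probability approaches 1.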
B.4 Model to predict exitus with PS as covariable
After calculating the PS for each intervention, one model per intervention is constructed to predict exitus with the PS of that intervention as a covariate. Finally, a general model is built with the PS of each intervention as covariates and the PS of any intervention as an offset.
Results
These results were obtained with synthetic data previously generated according to the data model.
Descriptive analysis
First, we imported our DataFrame, converted it to an Event Log and checked the dates.
There are 46 errors in emergency dates, if there are errors they may be dates outside the limits or there may be dates without having entered the emergency
There are 0 errors in hospital dates, if there are errors they may be dates outside the limits or there are dates without having entered the hospital
There are 0 errors in hospital dates with emergency dates, the hospital admission date is between the emergency admission and discharge date (included).
As a descriptive measure of the data, a bar plot shows the number of distinct traces and their frequency.
Survival analysis
The Kaplan-Meier curves for the 10 most common pathways and the Cox model summary are shown below.
<lifelines.CoxPHFitter: fitted with 133 total observations, 61 right-censored observations>
duration col = 'survival_in_days'
event col = 'status'
penalizer = 0.1
l1 ratio = 0.0
baseline estimation = breslow
number of observations = 133
number of events observed = 72
partial log-likelihood = -324.04
time fit was run = 2023-11-16 11:32:47 UTC
---
| covariate | coef | exp(coef) | se(coef) | coef lower 95% | coef upper 95% | exp(coef) lower 95% | exp(coef) upper 95% |
|---|---|---|---|---|---|---|---|
| trace_1 | -0.08 | 0.93 | 0.40 | -0.86 | 0.70 | 0.42 | 2.02 |
| trace_2 | 0.13 | 1.14 | 0.43 | -0.70 | 0.97 | 0.49 | 2.64 |
| trace_3 | 0.06 | 1.06 | 0.43 | -0.77 | 0.90 | 0.46 | 2.45 |
| trace_4 | 0.40 | 1.49 | 0.42 | -0.43 | 1.23 | 0.65 | 3.41 |
| trace_5 | 0.09 | 1.10 | 0.44 | -0.77 | 0.96 | 0.46 | 2.61 |
| trace_6 | -0.49 | 0.61 | 0.46 | -1.39 | 0.42 | 0.25 | 1.52 |
| trace_7 | 0.10 | 1.10 | 0.44 | -0.77 | 0.97 | 0.46 | 2.63 |
| trace_8 | -0.14 | 0.87 | 0.46 | -1.03 | 0.75 | 0.36 | 2.13 |
| trace_9 | -0.14 | 0.87 | 0.48 | -1.08 | 0.79 | 0.34 | 2.20 |
| trace_10 | 0.07 | 1.07 | 0.44 | -0.80 | 0.94 | 0.45 | 2.56 |

| covariate | cmp to | z | p | -log2(p) |
|---|---|---|---|---|
| trace_1 | 0.00 | -0.19 | 0.85 | 0.24 |
| trace_2 | 0.00 | 0.31 | 0.75 | 0.41 |
| trace_3 | 0.00 | 0.15 | 0.88 | 0.18 |
| trace_4 | 0.00 | 0.94 | 0.35 | 1.53 |
| trace_5 | 0.00 | 0.21 | 0.83 | 0.27 |
| trace_6 | 0.00 | -1.05 | 0.29 | 1.77 |
| trace_7 | 0.00 | 0.22 | 0.83 | 0.27 |
| trace_8 | 0.00 | -0.31 | 0.76 | 0.40 |
| trace_9 | 0.00 | -0.30 | 0.76 | 0.39 |
| trace_10 | 0.00 | 0.15 | 0.88 | 0.19 |
---
Concordance = 0.58
Partial AIC = 668.07
log-likelihood ratio test = 3.92 on 10 df
-log2(p) of ll-ratio test = 0.07
A- Process mining
Process discovery
Inductive miner traces:
Conformance checking
Comparison of the 10 most frequent traces with the theoretical one, using Jaccard similarity without taking the order into account:
The theoretical trace is:
The Jaccard similarity measured for the k=10 most frequent traces:
The histogram of the Jaccard similarity of the patient traces compared with the theoretical one is also shown below:
Decision mining
First we created a Petri net with the inductive algorithm, which allows us to identify the different decision points. The two images below show the same Petri net: the first highlights the decision points and the second the activities (transitions). To create a Petri net with decision points, it was necessary to delete records from the event log.
After creating the Petri net, we were able to see the importance of the features at each decision point.
In this case, the decision points are:
The decision point for fibrinolysis in hospital is: p_7
The decision point for thrombectomy in hospital is: p_7
The decision point for thrombolysis in emergency is: p_19
If the importance of the variables is not shown for a decision point, the complete decision mining process could not be carried out for that point, due to a lack of information or the complexity of the Petri net.
Prediction
Predictions are made by predicting the next activity, using the bupaR tool, which relies on a transformer model. For next-activity prediction, the model's evaluation is:
| loss | sparse_categorical_accuracy |
|---|---|
| 0.63 | 0.77 |
A plot is also shown that allows a clear view of the accuracy of the model:
B- Estimation of outcomes within a path
B.1 Kaplan-Meier survival plot
The Kaplan-Meier survival plot for the 4 intervention possibilities is shown below:
B.2 General COX model
A summary of the constructed Cox model is shown below: a table with all variables and, where possible, a ggforest plot with only the statistically significant variables (p-value < 0.05).
| surv_obj | |||
|---|---|---|---|
| Predictors | Estimates | CI | p |
| intervention [fibrinolysis] | 1.37 | 0.98 – 1.90 | 0.065 |
| intervention [thrombectomy_mec] | 1.01 | 0.65 – 1.55 | 0.973 |
| intervention [combined] | 1.07 | 0.78 – 1.47 | 0.674 |
| hospital cd [220020] | 1.50 | 0.80 – 2.81 | 0.202 |
| hospital cd [220036] | 0.63 | 0.31 – 1.30 | 0.211 |
| hospital cd [220041] | 1.31 | 0.70 – 2.48 | 0.399 |
| hospital cd [220054] | 0.97 | 0.53 – 1.77 | 0.908 |
| hospital cd [220089] | 1.15 | 0.54 – 2.42 | 0.721 |
| hospital cd [220105] | 1.37 | 0.75 – 2.50 | 0.303 |
| hospital cd [440012] | 1.28 | 0.70 – 2.35 | 0.427 |
| hospital cd [440027] | 0.64 | 0.33 – 1.25 | 0.189 |
| hospital cd [440033] | 0.73 | 0.34 – 1.56 | 0.416 |
| hospital cd [440048] | 0.77 | 0.41 – 1.44 | 0.412 |
| hospital cd [500016] | 0.68 | 0.33 – 1.41 | 0.306 |
| hospital cd [500021] | 1.18 | 0.63 – 2.20 | 0.608 |
| hospital cd [500055] | 1.12 | 0.61 – 2.08 | 0.710 |
| hospital cd [500068] | 1.26 | 0.66 – 2.41 | 0.488 |
| hospital cd [500074] | 1.01 | 0.55 – 1.87 | 0.966 |
| hospital cd [500080] | 0.79 | 0.39 – 1.63 | 0.528 |
| hospital cd [500093] | 1.40 | 0.72 – 2.73 | 0.324 |
| hospital cd [500107] | 1.06 | 0.55 – 2.02 | 0.867 |
| hospital cd [500114] | 1.30 | 0.69 – 2.43 | 0.417 |
| hospital cd [500129] | 1.04 | 0.54 – 2.02 | 0.906 |
| hospital cd [500135] | 0.87 | 0.44 – 1.72 | 0.692 |
| hospital cd [500140] | 0.82 | 0.40 – 1.69 | 0.594 |
| hospital cd [500153] | 1.22 | 0.66 – 2.27 | 0.531 |
| hospital cd [500172] | 0.79 | 0.39 – 1.63 | 0.529 |
| hospital cd [500188] | 0.68 | 0.34 – 1.36 | 0.277 |
| hospital cd [500195] | 1.06 | 0.56 – 1.98 | 0.866 |
| hospital cd [500200] | 0.96 | 0.51 – 1.80 | 0.901 |
| hospital cd [500218] | 0.67 | 0.32 – 1.41 | 0.294 |
| hospital cd [500223] | 1.38 | 0.74 – 2.57 | 0.316 |
| hospital cd [999999] | 0.75 | 0.39 – 1.42 | 0.375 |
| age nm | 1.00 | 1.00 – 1.01 | 0.080 |
| sex cd [1] | 0.96 | 0.74 – 1.24 | 0.737 |
| sex cd [2] | 1.01 | 0.78 – 1.31 | 0.927 |
| sex cd [9] | 1.12 | 0.87 – 1.44 | 0.377 |
| zip code cd | 1.00 | 1.00 – 1.00 | 0.184 |
| hospital type discharge cd [2] | 1.00 | 0.71 – 1.40 | 0.980 |
| hospital type discharge cd [3] | 0.96 | 0.68 – 1.34 | 0.800 |
| hospital type discharge cd [4] | 1.09 | 0.78 – 1.54 | 0.609 |
| hospital type discharge cd [5] | 1.04 | 0.75 – 1.44 | 0.809 |
| hospital type discharge cd [8] | 0.82 | 0.57 – 1.16 | 0.262 |
| hospital type discharge cd [9] | 1.05 | 0.76 – 1.45 | 0.784 |
| n admission prior emergency nm | 1.00 | 1.00 – 1.00 | 0.944 |
| n admission prior inhospital nm | 1.00 | 1.00 – 1.00 | 0.417 |
| modified rankin scale cd [1] | 0.75 | 0.51 – 1.10 | 0.140 |
| modified rankin scale cd [2] | 0.94 | 0.65 – 1.37 | 0.757 |
| modified rankin scale cd [3] | 0.85 | 0.57 – 1.26 | 0.427 |
| modified rankin scale cd [4] | 0.95 | 0.65 – 1.38 | 0.778 |
| modified rankin scale cd [5] | 1.38 | 0.96 – 1.99 | 0.081 |
| modified rankin scale cd [6] | 0.85 | 0.58 – 1.26 | 0.422 |
| modified rankin scale cd [7] | 0.66 | 0.45 – 0.97 | 0.037 |
| modified rankin scale cd [8] | 1.03 | 0.72 – 1.48 | 0.859 |
| heart failure bl | 0.98 | 0.82 – 1.17 | 0.829 |
| hypertension bl | 0.99 | 0.83 – 1.19 | 0.955 |
| diabetes bl | 1.16 | 0.97 – 1.39 | 0.103 |
| atrial fibrillation bl | 1.16 | 0.97 – 1.39 | 0.115 |
| valvular disease bl | 1.07 | 0.89 – 1.28 | 0.480 |
| rank trace [10] | 0.93 | 0.32 – 2.75 | 0.899 |
| rank trace [2] | 1.18 | 0.41 – 3.38 | 0.756 |
| rank trace [3] | 1.37 | 0.51 – 3.64 | 0.530 |
| rank trace [4] | 1.14 | 0.41 – 3.18 | 0.804 |
| rank trace [5] | 1.16 | 0.39 – 3.44 | 0.785 |
| rank trace [6] | 0.79 | 0.24 – 2.59 | 0.695 |
| rank trace [7] | 0.74 | 0.25 – 2.19 | 0.587 |
| rank trace [8] | 1.01 | 0.35 – 2.88 | 0.986 |
| rank trace [9] | 0.61 | 0.19 – 1.99 | 0.413 |
| rank trace [otros] | 1.13 | 0.55 – 2.31 | 0.739 |
| jaccard similarity | 0.02 | 0.00 – 0.41 | 0.011 |
| dur trace | 1.00 | 1.00 – 1.00 | 0.028 |
| period | 1.00 | 1.00 – 1.00 | 0.974 |
| holiday bl | 1.33 | 0.74 – 2.40 | 0.339 |
| weekend bl | 0.97 | 0.70 – 1.34 | 0.833 |
| weekday [Monday] | 1.20 | 0.86 – 1.68 | 0.278 |
| weekday [Saturday] | 1.09 | 0.78 – 1.54 | 0.603 |
| weekday [Thursday] | 1.10 | 0.80 – 1.52 | 0.569 |
| weekday [Tuesday] | 0.97 | 0.70 – 1.37 | 0.880 |
| weekday [Wednesday] | 0.87 | 0.62 – 1.22 | 0.408 |
| Observations | 1000 | ||
| R2 Nagelkerke | 0.080 | ||
The overall summary below tests whether the proportional hazards assumption is satisfied; a p-value < 0.05 indicates that it is not. The test may be impossible to compute when the Cox model contains collinear variables or when there are too few events.
| | chisq | df | p |
|---|---|---|---|
| intervention | 4.0719 | 3 | 0.254 |
| hospital_cd | 21.7062 | 30 | 0.865 |
| age_nm | 2.7951 | 1 | 0.095 |
| sex_cd | 3.3488 | 3 | 0.341 |
| zip_code_cd | 1.0276 | 1 | 0.311 |
| hospital_type_discharge_cd | 88.6355 | 6 | < 2e-16 |
| n_admission_prior_emergency_nm | 0.0216 | 1 | 0.883 |
| n_admission_prior_inhospital_nm | 0.2497 | 1 | 0.617 |
| modified_rankin_scale_cd | 12.2530 | 8 | 0.140 |
| heart_failure_bl | 0.1763 | 1 | 0.675 |
| hypertension_bl | 0.0919 | 1 | 0.762 |
| diabetes_bl | 0.4710 | 1 | 0.493 |
| atrial_fibrillation_bl | 0.0114 | 1 | 0.915 |
| valvular_disease_bl | 3.5051 | 1 | 0.061 |
| rank_trace | 8.7919 | 10 | 0.552 |
| jaccard_similarity | 1.1866 | 1 | 0.276 |
| dur_trace | 78.9327 | 1 | < 2e-16 |
| period | 31.4424 | 1 | 2.1e-08 |
| holiday_bl | 0.2912 | 1 | 0.589 |
| weekend_bl | 2.2131 | 1 | 0.137 |
| weekday | 1.8529 | 5 | 0.869 |
| GLOBAL | 251.6357 | 79 | < 2e-16 |
B.3 Propensity to intervention model
The summary of the propensity to fibrinolysis intervention model is shown below:
| fibrinolysis intervention bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.00 | 0.00 – 0.00 | <0.001 |
| modified rankin scale cd [1] | 1.38 | 0.75 – 2.56 | 0.301 |
| modified rankin scale cd [2] | 1.75 | 0.96 – 3.22 | 0.070 |
| modified rankin scale cd [3] | 1.28 | 0.69 – 2.40 | 0.438 |
| modified rankin scale cd [4] | 1.19 | 0.64 – 2.25 | 0.582 |
| modified rankin scale cd [5] | 0.92 | 0.49 – 1.73 | 0.792 |
| modified rankin scale cd [6] | 1.09 | 0.58 – 2.03 | 0.792 |
| modified rankin scale cd [7] | 1.24 | 0.67 – 2.28 | 0.492 |
| modified rankin scale cd [8] | 1.15 | 0.63 – 2.09 | 0.651 |
| jaccard similarity | 563048019785.21 | 8400675095.77 – 44961970582413.91 | <0.001 |
| dur trace | 1.00 | 1.00 – 1.00 | 0.675 |
| Observations | 1000 | ||
| R2 Tjur | 0.197 | ||
The summary of the propensity to mechanical thrombectomy intervention model is shown below:
| thrombectomy mechanic intervention bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 1246152.23 | 50734.79 – 38266721.51 | <0.001 |
| modified rankin scale cd [1] | 1.47 | 0.69 – 3.23 | 0.322 |
| modified rankin scale cd [2] | 0.68 | 0.27 – 1.66 | 0.397 |
| modified rankin scale cd [3] | 0.36 | 0.12 – 0.99 | 0.057 |
| modified rankin scale cd [4] | 0.64 | 0.24 – 1.60 | 0.348 |
| modified rankin scale cd [5] | 0.83 | 0.35 – 1.93 | 0.665 |
| modified rankin scale cd [6] | 0.94 | 0.39 – 2.26 | 0.896 |
| modified rankin scale cd [7] | 0.87 | 0.39 – 1.95 | 0.733 |
| modified rankin scale cd [8] | 0.86 | 0.39 – 1.92 | 0.707 |
| jaccard similarity | 0.00 | 0.00 – 0.00 | <0.001 |
| dur trace | 1.00 | 1.00 – 1.00 | 0.568 |
| Observations | 1000 | ||
| R2 Tjur | 0.107 | ||
The summary of the propensity to combined intervention model is shown below:
| combined intervention bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 48.27 | 8.98 – 264.74 | <0.001 |
| modified rankin scale cd [1] | 0.97 | 0.56 – 1.68 | 0.905 |
| modified rankin scale cd [2] | 1.07 | 0.61 – 1.85 | 0.821 |
| modified rankin scale cd [3] | 1.22 | 0.69 – 2.14 | 0.494 |
| modified rankin scale cd [4] | 1.17 | 0.66 – 2.06 | 0.593 |
| modified rankin scale cd [5] | 1.31 | 0.75 – 2.27 | 0.341 |
| modified rankin scale cd [6] | 1.02 | 0.58 – 1.79 | 0.948 |
| modified rankin scale cd [7] | 1.11 | 0.65 – 1.91 | 0.698 |
| modified rankin scale cd [8] | 1.10 | 0.65 – 1.87 | 0.720 |
| jaccard similarity | 0.00 | 0.00 – 0.00 | <0.001 |
| dur trace | 1.00 | 1.00 – 1.00 | 0.610 |
| Observations | 1000 | ||
| R2 Tjur | 0.029 | ||
B.4 Model to predict exitus with PS as covariable
| exitus bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 1.20 | 0.90 – 1.61 | 0.209 |
| ps fibrinolysis | 0.95 | 0.83 – 1.10 | 0.521 |
| Observations | 1000 | ||
| R2 Tjur | 0.000 | ||
| exitus bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.39 | 0.17 – 0.93 | 0.034 |
| ps combined | 1.84 | 1.11 – 3.05 | 0.018 |
| Observations | 1000 | ||
| R2 Tjur | 0.006 | ||
| exitus bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.51 | 0.25 – 1.03 | 0.061 |
| ps thrombectomy mec | 1.94 | 1.07 – 3.54 | 0.029 |
| Observations | 1000 | ||
| R2 Tjur | 0.005 | ||
Finally, a model is built to predict exitus with the PS calculated in each of the previous models as covariates and the PS of any intervention as an offset:
| exitus bl | |||
|---|---|---|---|
| Predictors | Odds Ratios | CI | p |
| (Intercept) | 0.00 | 0.00 – 0.00 | <0.001 |
| ps fibrinolysis | 0.24 | 0.16 – 0.35 | <0.001 |
| ps thrombectomy mec | 0.02 | 0.00 – 0.11 | <0.001 |
| ps combined | 48.69 | 10.36 – 232.38 | <0.001 |
| Observations | 1000 | ||
| R2 Tjur | 0.024 | ||